On the (Statistical) Detection of Adversarial Examples

نویسندگان

Kathrin Grosse

Praveen Manoharan

Nicolas Papernot

Michael Backes

Patrick D. McDaniel

چکیده

Machine Learning (ML) models are applied in a variety of tasks such as network intrusion detection or malware classification. Yet, these models are vulnerable to a class of malicious inputs known as adversarial examples. These are slightly perturbed inputs that are classified incorrectly by the ML model. The mitigation of these adversarial inputs remains an open problem. As a step towards a model-agnostic defense against adversarial examples, we show that they are not drawn from the same distribution than the original data, and can thus be detected using statistical tests. As the number of malicious points included in samples presented to the test diminishes, its detection confidence decreases. Hence, we introduce a complimentary approach to identify specific inputs that are adversarial among sets of inputs flagged by the statistical test. Specifically, we augment our ML model with an additional output, in which the model is trained to classify all adversarial inputs. We evaluate our approach on multiple adversarial example crafting methods (including the fast gradient sign and Jacobian-based saliency map methods) with several datasets. The statistical test flags sample sets containing adversarial inputs with confidence above 80%. Furthermore, our augmented model either detects adversarial examples with high accuracy (> 80%) or increases the adversary’s cost—the perturbation added—by more than 150%. In this way, we show that statistical properties of adversarial examples are essential to their detection.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Generalizable Adversarial Examples Detection Based on Bi-model Decision Mismatch

Deep neural networks (DNNs) have shown phenomenal success in a wide range of applications. However, recent studies have discovered that they are vulnerable to Adversarial Examples, i.e., original samples with added subtle perturbations. Such perturbations are often too small and imperceptible to humans, yet they can easily fool the neural networks. Few defense techniques against adversarial exa...

متن کامل

Characterizing Adversarial Subspaces Using Local Intrinsic Dimensionality

Deep Neural Networks (DNNs) have recently been shown to be vulnerable against adversarial examples, which are carefully crafted instances that can mislead DNNs to make errors during prediction. To better understand such attacks, a characterization is needed of the properties of regions (the so-called ‘adversarial subspaces’) in which adversarial examples lie. In particular, effective measures a...

متن کامل

HyperNetworks with statistical filtering for defending adversarial examples

Deep learning algorithms have been known to be vulnerable to adversarial perturbations in various tasks such as image classification. This problem was addressed by employing several defense methods for detection and rejection of particular types of attacks. However, training and manipulating networks according to particular defense schemes increases computational complexity of the learning algo...

متن کامل

Hypernetworks with Statistical Filtering for Defending Adversarial Examples

متن کامل

Adversarial Deep Learning for Robust Detection of Binary Encoded Malware

Malware is constantly adapting in order to avoid detection. Model based malware detectors, such as SVM and neural networks, are vulnerable to so-called adversarial examples which are modest changes to detectable malware that allows the resulting malware to evade detection. Continuous-valued methods that are robust to adversarial examples of images have been developed using saddle-point optimiza...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

CoRR

دوره abs/1702.06280 شماره

صفحات -

تاریخ انتشار 2017

On the (Statistical) Detection of Adversarial Examples

نویسندگان

چکیده

منابع مشابه

Generalizable Adversarial Examples Detection Based on Bi-model Decision Mismatch

Characterizing Adversarial Subspaces Using Local Intrinsic Dimensionality

HyperNetworks with statistical filtering for defending adversarial examples

Hypernetworks with Statistical Filtering for Defending Adversarial Examples

Adversarial Deep Learning for Robust Detection of Binary Encoded Malware

عنوان ژورنال:

اشتراک گذاری